2 research outputs found

    Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning

    Get PDF
    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business IntelligenceImbalanced datasets in supervised learning are considered an ongoing challenging task for standard algorithms, seeing as they are designed to handle balanced class distributions and perform poorly when applied to problems of the imbalanced nature. Many methods have been developed to address this specific problem but the more general approach to achieve a balanced class distribution is data level modification, instead of algorithm modifications. Although class imbalances are responsible for significant losses of performance in standard classifiers in many different types of problems, another aspect that is important to consider is the small disjuncts problem. Therefore, it is important to consider and understand solutions that not only take into the account the between-class imbalance (the imbalance occurring between the two classes) but also the within-class imbalance (the imbalance occurring between the sub-clusters of each class) and to oversample the dataset by rectifying these two types of imbalances simultaneously. It has been shown that cluster-based oversampling is a robust solution that takes into consideration these two problems. This work sets out to study the effect and impact combining different existing oversampling methods with a clustering-based approach. Empirical results of extensive experiments show that the combinations of different oversampling techniques with the clustering algorithm k-means – K-Means Oversampling - improves upon classification results resulting solely from the oversampling techniques with no prior clustering step
    corecore